Appsheet: Efficient use of web workers to support decision making
Abstract
The wealth of information and social resources online has raised the bar for the quality of decisions that individuals and businesses can make. Human computation and social mediums have also increased the potential for finding relevant information or opinions and making them a part of a decision-making process. However, the strategies that individuals employ when confronted with too much information (satisficing, information foraging, etc.) are more difficult to apply with a large, distributed group. Appsheet is a new technology foundation that uses a spreadsheet model of a decision to guide distributed search parties in support of decision-making applications.

INTRODUCTION

The explosive growth in availability of information online has opened a bounty of resources for users making plans and strategic decisions. Users can turn to web pages and online databases for questions requiring objective information. Subjective questions can often be answered by a post on a social network or discussion forum, or even a direct email or chat message to a trusted colleague.

The challenge comes when trying to leverage these mediums for more complex problems that require more thorough analysis. While it may be easy to find an answer to a specific question, a solution to the overarching problem may be more elusive. Consider the following questions:

Where should I buy groceries this week to save money? If grocery stores made all of their prices freely available as a downloadable price list or through an API, then customers could save money by comparing the total cost of the items on their shopping list at each store.

Where/when can I vacation to balance cost vs. preference? If every airline made their best fares available in a form that could be imported into a spreadsheet, then travelers could assign a preference weight to each destination, departure date, and return date they were considering, and balance with the costs to find the best itinerary for each situation.
What graduate programs should I apply to? If all relevant details about every program were readily available in one place, we could easily identify the ones that best fit our personal criteria.

Which of the 6,000 paper submissions should be accepted? If reviewers had unlimited time, every paper could be read by every relevant reviewer, ensuring greater consistency.

Ideally we would like answers that account for the specific needs of each individual or situation and make use of all available information. In reality, time to gather the information is limited, so we compromise. We accept a suboptimal decision based on partial information or take the first option that meets some baseline standard. This behavior by individual decision-makers, termed satisficing by Simon [36], results from the inherent tradeoff between the cost of deciding and the benefit of a well-reasoned decision that utilizes as much information as possible.

The rapid development of online micro-task markets, such as Amazon Mechanical Turk, presents options that did not previously exist. Now, one can easily hire web workers to gather information, paying a small amount of money for a small amount of work. For example, Smartsheet is a commercial service that facilitates the process of hiring workers to search for data [17].

Options have also emerged for more subjective matters. Question-answer forums (e.g., Yahoo Answers [21]) and social network site features (e.g., Facebook Questions [15]) allow users to solicit answers to questions in cases where a public discussion might yield a better result. To receive an answer, the question should not be too complex, and it should be expressed concisely, so that it does not require much effort to respond to [13]. For other problems demanding more privacy or control over who provides the answers, tasks can be posted to an internal messaging system.

A limitation of such distributed approaches is that they follow a static workflow.
Requests are sent and then the requester waits until all results have been received. In contrast, individual information foraging tends to follow a dynamic workflow [32]. Users seek out the information that will have the greatest impact on the final result, constantly adjusting the strategy based on the information seen so far. Current crowdsourcing systems offer no such facility.

Appsheet is a research prototype we built to demonstrate a new method of coordinating workers to support decision-making tasks. To reduce human effort wasted gathering unnecessary information, it leverages a model of the decision provided by the user in the form of a spreadsheet. To start, the user (someone with a decision to make) creates an empty spreadsheet with formulas that will calculate the end result. In cells where information is needed, the user enters a special ASK(...) formula. The parameters to this formula communicate to the Appsheet server that some information is needed and the range of values that is expected. Appsheet analyzes the formulas and sends requests to the helpers specified by the user. Using a value of information analysis, the system eliminates any ASK formulas that will not affect the final result. Since humans are generally much slower and more expensive than CPUs, the system aggressively prioritizes the inputs to minimize unnecessary requests.

Appsheet is primarily intended for problems that are oriented toward gathering a lot of details and then selecting a subset of them, typically based on the relation to the rest. This includes the examples above and, more generally, any problem that can be modeled using operations such as minimum, maximum, median, top-n, nth-largest, pick-any, or if/else. Since expected bounds of the inputs are specified by the user a priori, such operations can often be precisely computed with only partial information. Appsheet is not intended for needle-in-a-haystack search problems (e.g., find Mayor X’s salary).
Also excluded are problems where a large set of information is gathered and all of it is to be used (e.g., find salaries of all mayors to publish a complete list in a newspaper).

The key contribution of this paper is a technical foundation for coordinating and optimizing human data gathering processes in support of decision making.

NEED FOR RESEARCH

One of the primary ways Appsheet helps users make decisions is by coordinating many participants to gather the requisite information. This can be thought of as a form of collaborative search. Collaborative search has been well-explored for small groups of searchers, both co-located and remote [3, 29, 30]. Typically, the collaborators are working toward a common goal that is important to all of them. Appsheet is different in that the initiative comes primarily from the person who initiates the process. The human helpers may or may not also be stakeholders in the problem. Also, Appsheet aims to support much larger, more distributed, and possibly anonymous groups working together.

Crowdsourced search has demonstrated some limited commercial viability, as shown by ChaCha, a company that hires remote workers to interpret users’ search terms and return high quality search results in real time [9, 23].

Smartsheet, mentioned earlier, is the most similar project to Appsheet. Paid web workers enter information to fill a spreadsheet-like interface [17]. However, Smartsheet is only for collecting monolithic lists of information. It does not allow for collecting a cell of data at a time, and there is no support for goal-directed control based on the formulas and information obtained. In contrast, Appsheet supports requesting data one cell at a time, and it uses the formulas in the model to dynamically decide which information to collect next.
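
The idea of using the formulas and the user-supplied bounds to decide which information to collect next can be illustrated with a minimal Python sketch for a MIN formula. It is not Appsheet's actual implementation: the `Cell` class, the `cells_worth_asking` function, and the price ranges are hypothetical, and the pruning rule shown (skip any input whose lower bound is not below the smallest known upper bound) is only one simple instance of the kind of analysis described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cell:
    """One input cell: expected bounds, plus the answer once a helper replies."""
    name: str
    low: float
    high: float
    value: Optional[float] = None  # None means "not answered yet"

    def bounds(self) -> Tuple[float, float]:
        # An answered cell collapses to a single point.
        if self.value is not None:
            return (self.value, self.value)
        return (self.low, self.high)


def cells_worth_asking(cells: List[Cell]) -> List[Cell]:
    """Return the unanswered cells that could still change the value of MIN(cells)."""
    # The minimum can never exceed the smallest upper bound seen so far.
    best_upper = min(c.bounds()[1] for c in cells)
    # A cell whose lower bound is already at or above that cannot lower the minimum.
    return [c for c in cells if c.value is None and c.low < best_upper]


# Hypothetical weekly totals at three stores; Store B has already been answered.
cells = [
    Cell("Store A total", 900, 1200),
    Cell("Store B total", 1000, 1100, value=1050),
    Cell("Store C total", 1150, 1400),
]
to_ask = [c.name for c in cells_worth_asking(cells)]
print(to_ask)  # ['Store A total']
```

In this example Store C is never requested: its lowest possible total (1150) already exceeds Store B's known total (1050), so no answer for Store C could change the minimum.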
Aardvark was a commercial service that, until 2011, allowed a person to enter a question into an instant messaging client and get a response from another user of the service who was expected to be knowledgeable about the topic of the question [23]. This was good for simple questions where one person might reasonably be expected to know the answer. It did not help with more complex tasks where information needed to be gathered from different sources. However, the mode of communicating with helpers was novel. We envision using something similar with Appsheet in the future to reduce the delay between when a request is sent and when it is answered. While interrupting helpers with requests in chat windows might be considered too intrusive in many situations, we suppose that among a workgroup that is already working actively on the problem, it might be acceptable and yield productivity gains sufficient to warrant the intrusions.

Freebase is a commercial project that has hired workers to enter information into a very large encyclopedic ontology [6]. Crowdsourcing has also been used to gather data to make traditional search engines better [22, 25]. These efforts are aimed at producing large databases for future use, as opposed to solving a single problem, as in the case of Appsheet. However, they are similar in that they coordinate web workers to gather information from the web and organize it in a manner that supports the application.

Several efforts have been made to develop easy programming models for directing web workers. TurKit is a programming system for using Mechanical Turk. It allows programmers to write programs in JavaScript as if all requests to workers were handled immediately, essentially factoring out the usual lag in waiting for results [24]. Whereas TurKit is more generally applicable (beyond decision-making), Appsheet uses far more aggressive methods for optimizing evaluation in order to make more efficient use of the human effort.
CrowdLang is another programming system, which aims to better support the task of building systems that interact with crowd labor, often in conjunction with complementary machine resources [28]. Like Appsheet, both of these support the use of a programming language to describe a dynamic workflow that web workers will follow.

Note that spreadsheet formulas are considered a first-order functional programming language [1]. Functional programming is a paradigm in programming languages that emphasizes data flow through mathematical functions instead of variables and mutable data structures. As evidence of the versatility of spreadsheet formulas as a programming language, Casamir showed how many elementary functional programming exercises (e.g., towers of Hanoi, Fibonacci sequences, generation of permutations) can actually be accomplished using clever spreadsheet formulas [8].

Appsheet could have been built around a different programming language. We briefly considered using a variant of JavaScript, Python, or Scala. However, from the perspective of optimization, spreadsheets have the advantage that many semantic properties, such as the order of evaluating parameters and operands, are not well defined. For example, whereas in most languages, evaluation of the expression X AND Y always proceeds left-to-right, the creators of spreadsheets make no such guarantee. Appsheet exploits these ambiguities by actively choosing the order that will minimize human effort in the overall workflow.

A simpler “language” for specifying crowdsourced workflows is the so-called human macro found in Soylent [4]. To use this feature, a user of Microsoft Word describes in natural language some text manipulations to be done. Then, paid web workers read the instructions and perform the manipulations. Even simpler yet, VizWiz allows blind users to take a photo using a smartphone and then speak a question about the photo in natural language; answers are given by a paid web worker [5].
Like Appsheet, both of these projects use paid web workers to provide crucial functionality in an end-user application.

Whereas Appsheet is essentially applying a spreadsheet interface to working with crowdsourcing channels, Qurk and CrowdDB have applied a database interface. Both projects allow a user to write SQL-like queries to be answered by web workers, even including sort and join operations [16, 26, 27]. Although the objective and mechanisms are very different from Appsheet, these are similar in that they use a goal-directed process to generate requests to paid web workers.

In addition to the languages mentioned above, some projects have focused on supporting complex tasks using crowdsourcing, typically using some sort of divide-and-conquer strategy [19, 20]. Like Appsheet, these projects use a specification of the end goal provided by the user, and dynamically decide what tasks to issue to workers.

At the heart of Appsheet and most of the aforementioned projects is the basic idea of using software to direct humans to perform a process according to an explicitly defined algorithm. This is known as human computation [33]. Appsheet is a specific instance of human computation that is goal-driven, using a model.

Human computation is closely intertwined with artificial intelligence. In fact, human computation is sometimes called artificial artificial intelligence [2, 37]. Dai et al. have used artificial intelligence methods to achieve specific quality levels from untrusted paid web workers, by using a decision-theoretic optimization model to balance result quality with the cost of obtaining the result [12]. That work assumes that all requests must be fulfilled. In contrast, Appsheet sidesteps the issue of quality and focuses instead on achieving a solution to a high-level decision problem with as few requests as possible. The value of information analysis it uses is rooted in artificial intelligence methods for decision-making [35].
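
As a rough illustration of how choosing an evaluation order can keep the number of requests low, consider an expression such as X AND Y in which each operand must be answered by a human helper. The Python sketch below is not the paper's algorithm: the function names and the probabilities are hypothetical, and it assumes that every request costs the same and that the operands are independent. Under those assumptions it simply compares the expected number of requests for each possible order.

```python
from itertools import permutations
from typing import Dict, Tuple


def expected_requests(order: Tuple[str, ...], p_true: Dict[str, float]) -> float:
    """Expected number of helper requests needed to evaluate an AND in this order.

    Evaluation stops as soon as one operand comes back False.
    """
    expected = 0.0
    p_reached = 1.0  # probability that evaluation gets as far as this operand
    for name in order:
        expected += p_reached          # one request is sent if we get here
        p_reached *= p_true[name]      # we continue only if the answer is True
    return expected


def best_order(p_true: Dict[str, float]) -> Tuple[str, ...]:
    """Return the operand order with the fewest expected requests."""
    return min(permutations(p_true), key=lambda order: expected_requests(order, p_true))


# Hypothetical estimates: X is very likely True, Y is usually False.
p_true = {"X": 0.9, "Y": 0.2}
print(expected_requests(("X", "Y"), p_true))  # about 1.9 requests left-to-right
print(best_order(p_true))                     # ('Y', 'X'), about 1.2 requests
```

Asking about Y first is cheaper here because Y is likely to be False, which settles the AND without ever asking about X. A language that fixes left-to-right evaluation would not permit this reordering.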
Within machine learning, active learning is a technique whereby computer estimates of task difficulty are used to actively solicit new training data from humans in order to improve the accuracy of a classifier [38]. On the surface, this might appear to be similar to Appsheet, since it involves making requests using algorithms that aim to achieve a goal while minimizing the burden on the human helpers. The main difference is that active learning creates a predictive model (classifier), whereas Appsheet helps a particular user achieve a single goal in a prescribed way according to the formulas in that user’s decision model.

Appsheet may be used as a means of group decision support [31], but that is not the primary focus. Whereas typical group decision support systems are geared toward situations where all participants are stakeholders to some degree, the premise of Appsheet is that one user, the person who provides the model, owns the process and is the primary stakeholder. The helpers might have an interest in the outcome, but that is not an assumption. Also, unlike typical group decision support systems, Appsheet does not explicitly support brainstorming or voting.

One of the advantages of Appsheet is that it allows one to benefit from many information sources, rather than just a single search engine or a single database. Similarly, the Liquid Query paradigm aims to support queries that span multiple domains that might otherwise be accessible only by multiple search services, using an SQL-like syntax [7]. The queries would be handled automatically, without help from web workers or other such human helpers.

A key characteristic of the problem domain of Appsheet in the context of search is that the tasks involve performing a series of many searches in a systematic, prescribed way.
Search Pad is a research prototype that was designed to support such search missions by automatically detecting searches that appear to be interrelated and offering interface support for note-taking [14]. In the case of Appsheet, the role of note-taking is handled implicitly by the spreadsheet model that collects the results.

Sensemaking is a model that describes a broader class of complex search behavior in terms of the costs of searching and processing the information found [34]. It models the process of an individual developing an understanding of a topic, rather than pursuing a single, limited objective, as in the case of Appsheet. Information scent is a closely related concept which models the subjective value of a resource, as perceived by the information-seeker, along with the cost of accessing it [10]. Because it essentially balances the value of the information with the cost of acquisition, it is a closer analogy for what Appsheet does in a distributed way.

APPSHEET

With Appsheet, the user creates a spreadsheet to match the specifics of their particular task. Appsheet is implemented as an extension to the spreadsheet application. The use of Appsheet can best be explained by an example.

Andrew operates a catering business that needs to buy a substantial amount of food each week, typically about 50 items totaling about $1,000. Depending on the week, any of the five grocery stores nearby might have some of the items on sale. All of the grocery stores make their weekly flyers available online in PDF format, but there is no API or central database of current prices for groceries in a particular area. Therefore, Andrew will hire web workers to search through the flyers and look for sale prices on the items so he can get the best deal possible at a single store.

The process starts with a blank spreadsheet (Figure 1a). Next, he fills in the needed ingredients, the names of the stores, and formulas for the totals at the bottom (Figure 1b).
For compactness, we will abbreviate the scenario to only 5 items and 3 stores. Cell C7 calculates the sum of the prices at Store A.
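
To make the abbreviated scenario concrete, the following Python sketch models a sheet with 5 items and 3 stores, where every price cell starts out as an unanswered request with an expected range and each store's total is a sum over its column (as cell C7 does for Store A). The item names, price ranges, and the `total_bounds` function are hypothetical and do not come from the paper or its figures; the sketch only shows how bounds on a total narrow as answers arrive.

```python
from typing import Dict, Tuple

# Hypothetical shopping list with the expected (low, high) price for each item.
PRICE_RANGE: Dict[str, Tuple[float, float]] = {
    "flour": (2.0, 5.0),
    "butter": (3.0, 6.0),
    "eggs": (2.0, 4.0),
    "chicken": (8.0, 15.0),
    "rice": (1.0, 3.0),
}
STORES = ["Store A", "Store B", "Store C"]


def total_bounds(store: str, answers: Dict[Tuple[str, str], float]) -> Tuple[float, float]:
    """Bounds on one store's column total, given the answers received so far.

    `answers` maps (item, store) to a price reported by a helper.
    """
    low = high = 0.0
    for item, (lo, hi) in PRICE_RANGE.items():
        if (item, store) in answers:
            # A reported price contributes the same amount to both bounds.
            low += answers[(item, store)]
            high += answers[(item, store)]
        else:
            # An unanswered cell contributes its whole expected range.
            low += lo
            high += hi
    return (low, high)


# Two prices have come back for Store A; nothing yet for the other stores.
answers = {("flour", "Store A"): 3.0, ("chicken", "Store A"): 9.5}
for store in STORES:
    print(store, total_bounds(store, answers))
# Store A (18.5, 25.5)
# Store B (16.0, 33.0)
# Store C (16.0, 33.0)
```

Once the interval for one store's total lies entirely above another store's, the remaining unanswered cells in the more expensive store's column can no longer affect which store is cheapest.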